Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bigquery #455

Merged
merged 53 commits into from
Jun 19, 2017
Merged

Bigquery #455

merged 53 commits into from
Jun 19, 2017

Conversation

drewbanin
Copy link
Contributor

@drewbanin drewbanin commented Jun 7, 2017

-- profile.yml
bq:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: 'project_name'
      schema: "your_schema" # dataset
      threads: 1

Copy link
Member

@cmcarthur cmcarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in addition to comments here, this needs:

  • unit tests on completely new functionality (adapter task)
  • integration tests on BQ

@classmethod
def get_bigquery_credentials(cls, config):
method = config.get('method')
Creds = google.oauth2.service_account.Credentials
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't upcase variable names

Creds = google.oauth2.service_account.Credentials

if method == 'oauth':
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if they use the oauth method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We initialize a bigquery client with the credentials argument if it's a service account (from a private key file or the private key raw json).

If the credentials object is None (as with oauth), then the bq client lib will use the oauth flow to authenticate the user. We force this flow if the user isn't authenticated by shelling out to gcloud here

More info on gcp auth flows here

return Creds.from_service_account_info(details)

error = ('Bad `method` in profile: "{}". '
'Should be "oauth" or "service-account"'.format(method))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or service-account-json?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can we put this validation into a voluptuous schema for the bigquery profile?

from_name, to_name)
raise dbt.exceptions.NotImplementedException(message)

# hack because of current API limitations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate more? if there's a link explaining why we need this, you should add it here

Copy link
Contributor Author

@drewbanin drewbanin Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's actually not super well documented which is part of the problem. BQ defaults to the "Legacy SQL" syntax when views are created via the API. You can set a useLegacySql = False flag on the Table object, but it looks like it's presently ignored by the API.

Relevant issue

The best workaround i could find was to use the #standardSQL header line

_, cursor = cls.add_query(profile, sql, 'master')
res = cursor.fetchone()

logger.debug("Cancel query '{}': {}".format(connection_name, res))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior on ctrl+c here?

if we need to just raise an exception, you should remove all this code.

also please change this and the exception for add_query to dbt.exceptions.NotImplementedException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmcarthur hadn't gotten there yet :)

Looks like we can actually issue a cancel command for running queries: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/cancel

Copy link
Contributor Author

@drewbanin drewbanin Jun 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmcarthur it doesn't look like it's possible to cancel a table/view creation request. Jobs can be canceled though, which may be good to keep in mind for the future. This branch will log an error that looks like this on ctrl-c:

The bigquery adapter does not support query cancellation. Some queries may still be running!

for package, require in cls.requires.items():
logger.info("Installing {}".format(require))
pip_main(['install', require])
logger.info("Installed {} successfully!".format(require))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a unit test for this? i want to be sure it works on all platforms

@@ -33,10 +33,19 @@
Optional('role'): basestring,
})

bigquery_credentials_contract = Schema({
Required('method'): basestring,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All(basestring, Any(['oauth', 'service-account', 'service-account-json']))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can i just do one of these:

Any(['oauth', 'service-account', 'service-account-json'])

@@ -44,6 +43,7 @@


def make_log_dir_if_missing(log_dir):
import dbt.clients.system
dbt.clients.system.make_directory(log_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move this import in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logger imports the system client, and the system client imports the logger. This breaks the circular import


dbt.adapters.postgres.PostgresAdapter.drop.assert_not_called()

mock_adapter_truncate.assert_not_called()
mock_adapter_rename.assert_not_called()
mock_adapter_rename.assert_called_once()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes me nervous. was this not working properly before? my understanding that view models would just be created with the final name if they did not previously exist (vs created and then renamed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm really dissatisfied with our current drop/create/rename logic. It works ok, but that __dbt_tmp suffix makes a lot of coordination between dbt components unduly difficult. I think this was an attempt at simplifying how it works, but let me revisit. Good catch

@drewbanin
Copy link
Contributor Author

@cmcarthur this is pretty much good. I did change the way views are materialized (temp table, then renamed), so the changed unit test is warranted imo.

I have a single integration test for BQ which passes on circle and appveyor.

Think this is ready for another look!

Copy link
Member

@cmcarthur cmcarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewbanin this looks ok to ship. i did add one more comment.


@classmethod
def get_status(cls, cursor):
raise Exception("Not implemented")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotImplementedException // remove code below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch 👍

@drewbanin drewbanin merged commit 8b8b3aa into development Jun 19, 2017
@drewbanin drewbanin deleted the bigquery branch June 19, 2017 19:21
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* wip

* minimum viable bq adapter

* proper bq quoting

* absolute madness

* auto-install bq module

* catch runtime exception for nodes

* cleanup

* more cleanup

* pip critical logging

* add adapter command

* major wip

* refactorin

* closer

* handle model errors + dependent skips

* cleanup + test transactions (for now)

* move model creation to materializers

* fix for ephemeral models

* override materializer for bq

* error handling

* bq tests working

* commit tweaks for models

* service accounts

* service account json for bq

* better error message if adapter is not installed

* fix unit tests

* pep8

* fix integration tests

* codeclimate

* fix typos

* fix circular dep for imports

* catch programming exception for runners

* code review changes

* refactoring for code climate

* selector cleanup

* fix bug for erin

* comment

* handle cancellation on ctrl-c for bq (log warning)

* better bq validation

* test bq validation

* add uninstall flag to adapter task

* remove pip hacking nonsense

* bq integration tests

* remove initialize call for pg

* fix bq integration tests

* pep8

* remove -x opt from toxfile

* handle notimplemented for bq better

* missing import for seed task

* notimplemented for bq


automatic commit by git-black, original commits:
  8b8b3aa
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants